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Ch ■ Abstract 

A new O(A^) algorithm based on a recursion method, in which the com- 

^ putational effort is proportional to the number of atoms A^, is presented for 

OO 

T^lj- ' calculating the inverse of an overlap matrix which is needed in electronic 

o ■ 

^3 ' structure calculations with the the non-orthogonal localized basis set. This 

efficient inverting method can be incorporated in several O(A^) methods for 
' diagonalization of a generalized secular equation. By studying convergence 

B' 

I ' properties of the 1-norm of an error matrix for diamond and fee Al, this 

o 
o 



method is compared to three other 0(A) methods (the divide method, Tay- 
lor expansion method, and Hotelling's method) with regard to computational 



^ ' accuracy and efficiency within the density functional theory. The test cal- 



culations show that the new method is about one-hundred times faster than 
the divide method in computational time to achieve the same convergence for 
both diamond and fee Al, while the Taylor expansion method and Hotelling's 
method suffer from numerical instabilities in most cases. 



The development of O(A^) methods p|-p!0|] and the revival of localized orbitals as a basis 
set []n]-0 have been made during the last decade in order to extend the applicability of the 
first-principles molecular dynamics (FPMD) simulations using the plane wave expansion 
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and the Car-Parrinello method within density functional theories (DFT) pO[. However, 
only few applications of these O(A^) methods to large systems have been reported within 
the DFT calculations p!8|pl| , |22[] . Although there are a lot of limitations of the method based 
on the localized description one of the limitations is that several O(A^) methods require 
evaluating the inverse of the overlap matrix S which comes from non-orthogonality among 
the localized orbitals. 

In the generalized Fermi operator expansion (FOE) method [§] to the non-orthogonal 
basis we need to calculate the inverse of overlap matrix to construct the modified Hamiltonian 
H' = S~^H, while Stephan et al. have proposed solving a linear equation SH' = H with 
the cutoff radii of H instead of calculating the inverse of overlap matrix. In the density 
matrix (DM) method [p|-[10[| which is a promising approach for materials with a wide gap, 



fortunately, the evaluation of the inverse is not required during the optimization of grand 
potentials, although we have to evaluate the inverse of the overlap matrix for a good initial 
guess of the density matrix [|1^]. The block bond-order potential (BOP) method 0, which 
has good convergence properties for both insulators and metals, also requires the evaluation 
of the modified Hamiltonian H' as in method the FOE method. If the overlap matrix is 
sparse, the computational cost scales as the second power of the number of atoms in the 
inverse calculation. Therefore, an efficient O(A^) method for inverting the overlap matrix 
should be developed. 

So far, several O(A^) inverting methods have been proposed. Gibson et al. used a simple 
method in which a linear equation SH' = H constructed for a finite cluster is solved without 
explicit calculation of S'^ [^. Mauri et al. considered approximating the inverse of overlap 
matrix by the Taylor expansion 0. The approach could be an O(A^) inverting method 
when the matrix elements in the pth moment of the overlap matrix O are cut at a finite 
distance. Falser and Manolopoulos proposed to evaluate the inverse by Hotelling's method 
which is similar to the iterative purification algorithm of the DM method |T0[ . The iterative 
calculation can be performed in O(A^) operations, provided that the cutoff of matrix elements 
at a finite distance is introduced in the product of two matrices. It is worth pointing out that 



the ideas of these 0{N) inverting methods are analogous to those of the 0{N) methods for 
the diagonahzation. The divide method by Gibson et al. I^S], the Taylor expansion method 



0, and Hotelling's method [|I0| strategically and mathematically correspond to the divide 
and conquer method p, the FOE method 0,^, and the DM method |P^IO|, respectively. 
Therefore, one may expect that these O(A^) inverting methods may have the convergence 



properties for realistic materials similar to the 0{N) methods for the diagonahzation P4 
However, it remains to be seen whether the expectation is meaningful or not. 

In this paper we propose a new O(A^) method for calculating the inverse of the overlap 
matrix which is based on a resolvent and the block Lanczos algorithm. The new method 
is compared with the other three methods in terms of the computational accuracy and 
efficiency. Thus, our aim of this paper is to clarify the applicability of these four O(A^) 
inverting methods for realistic materials. The paper is organized as follows. In Sec. II we 
present the theory of a new O(A^) inverting method based on a recursion method, and also 
summarize the three other 0{N) inverting methods. In Sec. Ill we discuss the convergence 
properties of these four O(A^) inverting methods for the diamond and fee Al within the DFT 
calculations using the 1-norm of an error matrix which will be related to the error in the 
eigenvalues in this section. In Sec. IV we conclude with clear characterization of the four 
O(A^) inverse methods. 

II. THEORY 

A. Recursion method 

It is assumed that one-particle wave functions are expanded by a localized orbital basis 
set {\ia)), where i is a site index and a is an orbital index. The localized orbitals could 
be Slater-type |T^-|T3[, Gaussian-type [O, and numerical orbitals [ini'H obtained by DFT 



calculations for atoms. In most cases, the orbitals are non-orthogonal between them, leading 
to an overlap matrix S defined by 

Sia,j(3 = {ia\S\j/3), (1) 



where S is the overlap operator which is introduced as a matter of form in order to emphasize 
the similarity between the new inverting method and the block BOP method 0, although 
the overlap operator generally should be the identity operator I. The overlap integral ex- 
ponentially decays in real space because of the localized nature of the orbitals, so that the 
overlap matrix S is sparse. Here we introduce a resolvent R{Z) for the matrix S as follows: 

R{Z) = {S - Zl)-\ (2) 

It is then easy to verify that 

S-^ = ReR{0). (3) 

Thus, we see that the real part of the resolvent for Z = gives the inverse S^^ of the overlap 
matrix. If the resolvent for Z = has a finite value for the imaginary part, the basis set 
is not linearly independent. The resolvent can be evaluated by adopting the algorithm of 
the block BOP method which is recently developed to simulate orthogonal tight-binding 
(TB) models in O(A^) operations. It is noted that the new inverting method is derived just 
by replacing the Hamiltonian H in the block BOP method within the orthogonal TB models 
with the overlap operator S. The first step in this algorithm is to block-tridiagonalize the 
overlap matrix S using the block Lanczos algorithm p5|^8|. The central equations is 



Sm = + \Un-i)B^ + \Un+i)B^+i (4) 

with 

|f/o) = (Kl),|22),...,|2M,)) (5) 

as the starting state. and S„ are recursion block coefficients with Mj x Mj in size, 
where Mj is the number of localized orbitals on the starting atom i, and the underline 
indicates that the element is a block. In the block Lanczos algorithm, we need to start 
the recursion with Eq. (5) to make the recursion method accurate and efficient The 
Lanczos algorithm with a finite recursion transforms the overlap matrix S into the block- 
tridiagonalized matrix which has the diagonal An and the sub-diagonal block elements 



Bn, where the index L indicates the representation based on the Lanczos basis. Considering 
the resolvent R^{Z) = {S^ — ZT)^^ for the block-tridiagonahzed overlap matrix, the diagonal 
Rqq{Z) and off-diagonal block elements ^q^(Z) can be easily derived along the same line as 
that described in the block BOP method 0. For Z = 0, the elements are given by 

Eo^o(o) = [Ao - WAi - 'B,[. ■ r'm-'m-', (6) 

Eon(0)= (<^lJ-EoVl(OUn-l 

-RL-2i0yB^-i)iBn)-\ (7) 

where S is Kronecker's delta, and -Ro-i(O) = ^Bq = 0. Once the block diagonal element 
is calculated as the multiple inverse Eq. (6), the off-diagonal elements are evaluated from 
the recurrence relation Eq. (7) with Roq{0) as the starting element. In order to truncate 
the multiple inverse in Eq. (6) without reducing the accuracy significantly, a square root 
terminator could be used, while there could be an infinite number of levels in the multiple 
inverse of diagonal Green's function for an infinite system. In the test calculations of Sec. Ill 
we used the square root temninator for the truncation at a finite number of levels. The two 
Eqs. (6) and (7) provide the resolvent based on the Lanczos basis representation, so that we 
can obtain the original resolvent through the following inverse transformation: 

E.,(0)=E^On(0)Vni, (8) 

n 

where ^H^j is defined by *f/„j = (t/„|(|jl), |j2), . . . , |jMj)). The inverse transformation 
Eq. (8) is significantly simplified because of the orthogonality in the Lanczos bases. There- 
fore, we only have to evaluate the 0th block line of the resolvent in the Lanczos basis 
representation. The resolvent exactly satisfies a sum rule Z^jj ti' {SijRji(0)| = A''^ which is 
derived from Eq. (2), where Nb is the number of bases, and is constructed by up to (q+l)th 
moments 5"^^^ 0, where g is a final level for the recursion. Equation (8) gives a good 
approximation for the inverse of overlap matrix as the number of recursion levels increases. 
However, the approximated inverse is not strictly a symmetric matrix at a finite recursion. 
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If the approximated inverse is symmetric, eigenvalues of a generalized secular equation with 
the overlap matrix are real numbers. Therefore, we evaluate the inverse of overlap matrix 
by symmetrizing the resolvent in terms of simple arithmetic average: 



_ ReE,,(0) + Re%,(0) 
^ij 2 ■ ^ 

The symmetrization preserves the above sum rule. The all elements of the inverse are 
evaluated by applying the series of the algorithm repeatedly to each atom. The cluster over 
which the hops are made in the Lanczos algorithm is determined by the logical truncation 
method 0. Thus, the computational cost of the recursion method is strictly proportional 
to the number of atoms A^. 

B. Divide method 

In the case of the block BOP [H and FOE methods it is required to evaluate the 
modified Hamiltonian H' = S^^H rather than the inverse of overlap matrix. In such cases 
we have an alternative way where a linear equation 

SH' = H (10) 

is solved instead of calculating the inverse. In conventional ways of solving the linear equation 
for a total system, the computational cost scales as the third power of the number of atoms 
A^, while the scaling could be reduced to 0(A^^), making use of the sparseness of the overlap 
matrix. Therefore, Gibson et al. have proposed a solution of Eq. (10) with the cutoff radii 



of H and 5" |2^. The linear equation Eq. (10) can be decomposed into subspace hnear 
equations for finite clusters under this constraint. One solves each of the subspace linear 
equations for the finite clusters centered on atom i using a conventional method such as 
the Cholesky factorization, which results in O(A^) operations for the computational effort. 
However, the divide method has redundancy in the calculation that one has to evaluate 
all matrix elements of the modified Hamiltonian H' for each finite cluster compared to the 
other O(A^) inverting methods in which the elements in the inverse of the overlap matrix are 
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not doubly calculated. Thus, the prefactor of the O(A^) operations could be very large for 
highly coordinated structures such as fee. The magnitude of the prefactor will be discussed 
in Sec. III. An iterative scheme such as the Gauss-Siedel method [0,^] which is commonly 
used for large-scale systems is also available for solving the linear equation Eq. (10). However, 
it has been recognized that the iterative scheme is computationally expensive [^, so that 
the iterative scheme was not investigated in this study. We used the logical truncation 
method to construct the subspace linear equation as well as the recursion method in the 
test calculations discussed in Sec. Ill in order to compare the computational performance. 

C. Taylor expansion method 

Mauri et al. have proposed to approximate the inverse of the overlap matrix using the 
Taylor expansion in their O(A^) unconstrained minimization method [0]. The overlap matrix 
S is expressed as a sum of the identity I and an 0-matrix O which is the overlap matrix 
between the different orbitals: 

^ = 1 + 0, (11) 
then we can expand the inverse of S in respect to the 0-matrix as follows: 

oo 

S-^ = ^(-i)"0" 

n=0 

= 1-0 + 0^-0^ + ... (12) 

The computational accuracy and efficiency of the approximation by the Taylor series depend 
on the convergence for the summation of Eq. (12). The summation in Eq. (12) does not 
converge, but diverges, when the spectrum radius of the 0-matrix exceeds 1.0. Even if the 
0-matrix has no eigenvalues which are and below -1.0, indicating the basis set is linearly 
independent, the eigenvalues of the 0-matrix exceed 1.0 in most cases as shown in Sec. III. 
In such cases, the Taylor expansion method cannot be applied. The matrix O" is calculated 
as the product of the perfect but highly sparse 0-matrix, and O"^^ with the cutoff radii 
for the elements, so that the summation to a finite order in Eq. (12) can be performed with 
O(A^) operations. 
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D. Hotelling's method 

Falser and Manolopoulos |]T0[ have suggested evaluating the inverse using Hotelling's 



method |30j3T|]. The method has an iterative algorithm very similar to the purification 



algorithm |Ty] in the DM method. The convergence rate in Hotelling's method is also 
quadratic as with the DM method. The purification of an approximate inverse is achieved 
using the following iterative relation: 

^n+l = ^ — ^SS^ ^. (13) 

In case of Sq^ = I, Hotelling's method is equivalent to the Taylor expansion method to a 
finite order described in the previous subsection (C). It is easy to verify that 5*1 and S2 are 
the Taylor series to the first and third orders of the 0-matrix, respectively: 



c— 1 oc— 1 c— Ice— 1 

Oj^ — 

= 1-0, (14) 

Q — 1 r\Q — l Q— ICC— 1 

= 1-0 + 0^-0"^. (15) 

From Eqs. (14) and (15), we see that Hotelling's method converges quadratically compared to 
the linear convergence of Taylor expansion method. Thus, if Eq. (12) is a convergent series, 
Hotelling's method should be more efficient rather than the Taylor expansion method. When 
the spectrum radius of the 0-matrix exceeds 1.0, the identity 1 cannot be used as the initial 
guess for the inverse S~^. In such cases, although it is very difficult to estimate a good 
initial matrix Sq^ for the iteration Eq. (13), in this study, we use the overlap S with a small 
prefactor a derived by Fan and Reif |31| as the initial guess: 



S^^ = aS (16) 

with 
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meLxJ2\Sia,jf3\ 



It is noted that Hotelling's method possesses an advantage that the inverse at the previous 
MD step could be a good guess of Sq^ at the current MD step, while any information at the 
previous MD step cannot be made use of in the other methods; the recursion method, the 
divide method, and the Taylor expansion method. In the iteration Eq. (13), the elements of 
S^^ are cut at a finite distance. As a result of this truncation, the computational effort of 
Hotelling's method scales linearly with the system size. In test calculations of Sec. Ill, we 
used the logical truncation method for the cutoff of the elements as in the other inverting 
O(A^) methods. 

III. CONVERGENCE PROPERTIES 
A. Error analysis 

In order to compare the four O(A^) inverse methods presented in the Sec. II in terms of 
computational accuracy and efficiency, we first relate the 1-norm of an error matrix E with 



the error of eigenvalues of a secular equation by using an error analysis theory ||32| , |33 
The generalized secular equation with the overlap matrix S is derived from the variational 
principle within DFT using a non-orthogonal basis set. 

S-^HC, = t^C, (18) 

where iiia,j^ = (^Q^l-f^li/?) and Cia,y is an expansion coefficient Ci^^ = in a one- 

particle wave function \(f)u)- Let us consider substituting the exact inverse with an 
approximate inverse S'~^ in Eq. (18), then the difference between S^^ and S'^^ is 

^'-1 - = A^-^ (19) 

For the approximate inverse S'~^ the secular equation S'~^HC[, = e'^Cl is satisfied with 
approximate eigenvalues and eigenvectors C^. According to the error analysis theory 
|3^ , |33[| , the difference between the exact and the approximate eigenvalues is given by 
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(20) 



with A, which is the 1-norm of a matrix AS ^H, defined by 



A = max 

313 ^ 



fc7 



(21) 



Therefore, we see that the error in eigenvalue is proportional to the 1-norm of AS~^H for 
the approximation of the overlap matrix. Equation (20) apparently connects the error of 
the overlap matrix to that of the eigenvalue. However, it is not possible to calculate the 
exact inverse for infinite or periodic systems, so that we introduce an error matrix E, which 
is easily evaluated, defined as the difference between a matrix SS'~^H and the original 
Hamiltonian H: 



E = SS'-^H - H 



(22) 



The 1-norm -q of the error matrix E can be related to that A of the matrix AS H as follows: 



T] — max ^ 



ioi ky 



< max^^ \ Sk'Y,ia\ 

k'Y ia 



,313 



kj 



< Nav max V 

3i3 ~r 



Yl ^^ia,k'yHki,jl3 
kj 



(23) 



where 7Va„ is the average number of the non-zero elements in the overlap matrix for an orbital 
\ia). The third relation in Eq. (23) is derived by substituting the non-zero overlap integrals 
\Sk''y,ia\ to 1 with the variables ia fixed in the second relation. Considering Eqs. (21) and 
(23), we can relate the 1-norm of the error matrix to the error of the eigenvalue: 



0(77). 



(24) 



Therefore, we will compare the four 0{N) inverse methods using the 1-norm t], which is 
easily evaluated, instead of A. 
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B. Numerical tests 



We numerically studied convergence properties of the four inverse 0{N) methods using 
1-norm r] for diamond and fee Al within DFT proposed by Sankey and Niklewski In 
this DFT calculations we used numerical localized orbitals, fireball bases by Sankey and 



Niklewski ||TT|], as a minimal basis set for valence electrons. The radii of the radial- wave 
function confinement are 2.1 and 3.7 A for carbon and aluminum atoms, respectively. The 
minimal basis sets give 1.253 (1.244) and 2.515 (2.466) A as an equilibrium bond length 
of dimer for carbon and aluminum, respectively, where the values in the parentheses are 
experimental results. 

In Fig. (1) we show the density of states for eigenvalues of O- matrix, which is defined by 
Eq. (11), in diamond and fee Al. In both cases the 0-matrices have no eigenvalues smaller 
than -1.0, so that the basis sets are linearly independent for the structures. However, the 
density of states possess finite values for the eigenvalues larger than or equal to 1.0 in both 
cases. In other words the spectrum radius of the 0-matrix exceeds 1.0. This means that the 
summation in Eq. (12) for the Taylor expansion method diverges for diamond and fee Al. In 
addition to the above cases, we confirmed that the spectrum radii of the 0-matrix also exceed 
1.0 for the graphite and poly(ethylene), so that the applicability of the Taylor expansion 
method is strictly restricted. Therefore, we do not provide the convergence properties of the 
Taylor expansion method in this paper. 

Figure 2 shows the convergence properties of the 1-norm r] of the error matrix for diamond 
calculated by the recursion, divide, and Hotelling's methods. In the recursion method the 
1-norm exponentially decays for each shell cluster as a function of the number of recursion 
levels, and finally converges to the value of the 1-norm calculated by the divide method for 
the corresponding cluster. In the divide method the 1-norm almost exponentially diminishes 
as a function of number of shells. For the seven-shell cluster the 1-norm is only 3.1 x 10~^ eV. 
The identity matrix I cannot be used as an initial guess Sq^ in Hotelling's method because 
the spectrum radii of the 0-matrix exceed 1.0. Thus, we gave the initial guess Sq^ by 
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Eq. (16), where a is 0.021 for diamond. In Hotelling's method the convergence properties 
are not monotonic compared to the other two methods. For three-, five-, and seven-shell 
clusters, the 1-norm is gradually reduced for smaller number of iterations. However, the 
1-norm increases after reaching at the minimum, and finally we have a numerical instability 
that the 1-norm diverges as iteration proceeds. The smallest 1-norm for each shell-cluster 
is slightly larger than that calculated by the divide method for the same cluster. Therefore, 
we see that HoteUing's method cannot reach the perfect convergence for diamond due to the 
numerical instability. For Hotelling's method we also examined the convergence properties 
of the 1-norm r\ for carbon in the diamond structure with 3.9 A of a lattice constant in which 
the spectrum radius of the 0-matrix is within 1.0, while the result is not shown in this paper. 
In this system the 1-norm very quickly converges to the corresponding value calculated by 
the divide method for the same cluster. Thus, we heuristically find that Hotelling's method 
gives convergent results for systems with the spectrum radii smaller than 1.0. 

As with Fig. 2, the convergence properties of the 1-norm are shown in Fig. 3 for fee 
Al. The magnitude of the 1-norm is 1~2 order larger than that of diamond, while the 
behavior of the 1-norm is very similar to that of diamond. In the recursion method the 
converged values of the 1-norm are consistent with those of the divide method for four- and 
six-shell clusters, respectively. In Hotelling's method we used Eq. (16) with a = 0.0098 
as Sq^, since the spectrum radius of the O-matrix exceed 1.0 for fee Al. The 1-norms for 
the four- and six-shell clusters finally diverge without achieving the full convergence like for 
diamond. Although we tested the convergence properties using several values for a in both 
diamond and fee Al, we could not obtain converged results and moreover could not avoid 
the numerical instability. 

Figures 4(a) and 4(b) show the relation between the magnitude of the 1-norm r] of the 
error matrix and the computational time per atom to evaluate the inverse of the overlap 
matrix for diamond and fee Al, respectively. The comparison clearly indicates that the 
computational efficiency increases in the order of the divide < Hotelling's < the recursion 
methods for both diamond and fee Al. The recursion method is about one-hundred times 
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faster than the divide method in computational time to achieve the same convergence for 
diamond and fee Al. 

IV. CONCLUSIONS 

Wc presented a new 0{N) algorithm for calculating the inverse of the overlap matrix 
S. It is based on the recursion method with the block Lanczos algorithm. The problem 
of evaluating is mapped to the block BOP method for an orthogonal TB model just 
by replacing the Hamiltonian with the overlap operator. In addition, we briefly described 
the other known-methods for calculating the inverse in 0{N) operations: the divide, the 
Taylor expansion, and Hotelling's methods. We examined the computational accuracy and 
efficiency of these O(A^) inverting methods using the 1-norm of the error matrix for diamond 
and fee Al in DFT calculations with the minimal basis set for valence electrons. The spec- 
trum radius of the O-matrix given by (5 — 1) exceeds 1.0 for many real materials in the DFT 
calculations based on the localized bases, which means that the applicability of the Taylor 
expansion method is significantly restricted. In the recursion method the 1-norm of the error 
matrix exponentially converges to the value calculated by the divide method for the same 
cluster in both diamond and fee Al with numerical stability. On the other hand, Hotelling's 
method cannot reach the converged results due to the numerical instability in both cases. 
The comparison of computational time shows that the recursion method is the most efficient 
algorithm among the four O(A^) inverting methods in diamond and fee Al. The recursion 
method is about one-hundred times faster than the divide method. Thus, the new method 
for the evaluation of the inverse is a practical algorithm and can be incorporated in several 
0(iV) methods for total energy calculations using localized orbital basis. 
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FIGURES 

FIG. 1. The density of states for eigenvalues of the O-matrix for diamond and fee Al, where 

carbon and aluminum atoms have minimal numerical basis sets for valence electrons which were 
obtained by DFT calculations for the atomic states. The experimental values, 3.57 and 4.05 A, 
were used as the lattice constants of diamond and fee Al, respectively. 

FIG. 2. The 1-norm of the error matrix for diamond calculated by the (a) recursion, (b) divide, 
and (c) Hotelling's methods. In both the recursion and Hotelling's methods, the 1-norms were 
calculated for three-, five-, and seven-shell clusters as a function of number of recursion levels and 
iterations, respectively. 

FIG. 3. The 1-norm of the error matrix for fee Al calculated by the (a) recursion, (b) divide, 
and (c) Hotelling's methods. In both the recursion and Hotelling's methods, the 1-norms were 
calculated for four- and six-shell clusters as a function of number of recursion levels and iterations, 
respectively. 

FIG. 4. The 1-norm of the error matrix for (a) diamond and (b) fee Al against the computa- 
tional time taken per atom calculated by three 0{N) inverting methods. The calculations were 
performed using single processor on a Compaq ES40 workstation. 
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